feat: extend intrinsic matmul
#951
base: master
Conversation
Here are a few ideas for consideration:
pure function matmul_chain_order(p) result(s)
    integer, intent(in) :: p(:)
    integer :: s(1:size(p)-1, 2:size(p))
    integer :: m(1:size(p), 1:size(p))
    integer :: l, i, j, k, q, n
    n = size(p)
    ...
end function matmul_chain_order
pure module function stdlib_matmul(m1, m2, m3, m4, m5) result(e)
    real, intent(in) :: m1(:,:), m2(:,:)
    real, intent(in), optional :: m3(:,:), m4(:,:), m5(:,:) !> from the 3rd matrix onward they can all be optional
    real, allocatable :: e(:,:)
    integer :: p(5), i, num_present
    integer :: s(3,2:4)
    p(1) = size(m1, 1)
    p(2) = size(m2, 1)
    num_present = 2
    if(present(m3)) then
        p(3) = size(m3, 1)
        num_present = num_present + 1
    end if
    if(present(m4)) then
        p(4) = size(m4, 1)
        num_present = num_present + 1
    end if
    if(present(m5)) then
        p(5) = size(m5, 2)
        num_present = num_present + 1
    end if
    s = matmul_chain_order(p(1:num_present))
    ...
end function stdlib_matmul

For the procedure computing the orders you only need the array of dimensions p. For the main procedure, you could consider using optional arguments from the 3rd matrix onwards and then manage the rest within the same procedure, without recursion. Regarding the example, I would consider it much more interesting to set up a non-trivial example: one where the actual ordering is different from the naive sequence. Just some food for thought.
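To make the non-trivial-ordering point concrete, here is a small standalone illustration (not from this thread; all names and dimensions are invented) of a chain where the naive left-to-right order is far from optimal:

program chain_cost_demo
    implicit none
    ! hypothetical chain: A is 1000x2, B is 2x1000, C is 1000x2
    integer, parameter :: p(4) = [1000, 2, 1000, 2]
    integer :: cost_left, cost_right

    ! naive left-to-right order: (A*B)*C
    cost_left  = p(1)*p(2)*p(3) + p(1)*p(3)*p(4)
    ! alternative order: A*(B*C)
    cost_right = p(2)*p(3)*p(4) + p(1)*p(2)*p(4)

    print *, '(A*B)*C costs', cost_left,  'scalar multiplications'   ! 4000000
    print *, 'A*(B*C) costs', cost_right, 'scalar multiplications'   ! 8000
end program chain_cost_demo

Here the optimal ordering is roughly 500 times cheaper than the naive one, so such an example would clearly exercise the chain-order logic.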
Thank you very much @jalvesz for your detailed remarks, they are quite helpful. Regarding the signature I am a bit confused, because if we don't use recursion we would have to handle 14 cases (the 4th Catalan number) when the number of matrices is 5, which would become quite messy quickly. For 3 matrices it is much more efficient to compare the two ordering costs and dispatch an ordering appropriately (numpy claims that this is about 15 times more efficient than calculating a full chain ordering). And yes, I will add some non-trivial examples for sure. Looking forward to hearing your thoughts.
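For reference, a hedged sketch of the kind of 3-matrix dispatch described above (the name matmul_three and its internals are hypothetical; the PR's actual implementation may differ):

function matmul_three(a, b, c) result(d)
    implicit none
    real, intent(in) :: a(:,:), b(:,:), c(:,:)
    real, allocatable :: d(:,:)
    integer :: m, k, n, q

    m = size(a, 1); k = size(a, 2)
    n = size(b, 2); q = size(c, 2)

    ! compare the two possible orderings, counted in scalar multiplications
    if (m*k*n + m*n*q <= k*n*q + m*k*q) then
        d = matmul(matmul(a, b), c)   ! (A*B)*C is cheaper
    else
        d = matmul(a, matmul(b, c))   ! A*(B*C) is cheaper
    end if
end function matmul_three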
Indeed! Another idea: for 3 and 4 matrices, make them internal procedures, passing as an extra argument the ordering slice corresponding to those matrices. For the one public procedure with 2 mandatory + 3 optional matrices, you compute the ordering just once and call the internal versions for 3 and 4 depending on the case.
Thank you @jalvesz, I have implemented them accordingly and added some better examples, although I am not sure how I may test the correctness of multiplication with large matrices, or compare the time taken against the native matmul.
In terms of correctness I would say that the simplest approach would be to write down a couple of analytical cases for which the exact solution is fixed, plus a couple of cases with random matrices for which the optimal ordering is different from the trivial one, and compare: stdlib_matmul vs trivial sequential calls to the intrinsic matmul. For integer arrays the error should be zero; for real/complex arrays the error tolerance should be somewhere around machine precision. In terms of performance, it might not be necessary to add that to the PR, but a separate small program showcasing a couple of scenarios might be useful.
@loiseaujc maybe you have some use cases worth testing?
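A minimal sketch of such a correctness check, using random matrices whose optimal ordering differs from the trivial one (the module hosting stdlib_matmul is an assumption here, as is the exact tolerance):

program check_stdlib_matmul
    use stdlib_intrinsics, only: stdlib_matmul   ! module name assumed
    implicit none
    real :: a(200, 5), b(5, 200), c(200, 5)      ! optimal order is a*(b*c), not (a*b)*c
    real, allocatable :: ref(:,:), res(:,:)
    real :: tol

    call random_number(a); call random_number(b); call random_number(c)

    ref = matmul(matmul(a, b), c)   ! trivial sequential intrinsic calls
    res = stdlib_matmul(a, b, c)    ! chain-ordered version from this PR

    ! deliberately loose tolerance, scaled by the magnitude of the result
    tol = sqrt(epsilon(1.0)) * maxval(abs(ref))
    if (maxval(abs(res - ref)) > tol) error stop 'stdlib_matmul mismatch'
    print *, 'max abs difference:', maxval(abs(res - ref))
end program check_stdlib_matmul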
Oh boy! I had not seen this PR. Super happy with it. I'll take some time this afternoon to look at the code and give you some feedback. As for the test cases, anything involving multiplications with a low-rank matrix already factored would do. As far as I'm concerned, these are the typical cases I encounter where such an interface would prove very useful syntactically. I'll see, as I get to the lab, if I have a sufficiently simple yet somewhat realistic set of data we could use for testing the performance.
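As a concrete flavour of that use case, consider a low-rank factor applied to a block of vectors, where the chain order matters a lot (dimensions, names, and the hosting module are all invented for illustration):

program lowrank_demo
    use stdlib_intrinsics, only: stdlib_matmul   ! module name assumed
    implicit none
    real, allocatable :: U(:,:), Vt(:,:), X(:,:), Y(:,:)

    allocate(U(5000, 20), Vt(20, 5000), X(5000, 50))
    call random_number(U); call random_number(Vt); call random_number(X)

    ! (U*Vt)*X forms a 5000x5000 intermediate (~1.75e9 multiplications);
    ! U*(Vt*X) never does (~1e7). The chain-order logic should pick the latter.
    Y = stdlib_matmul(U, Vt, X)
    print *, shape(Y)
end program lowrank_demo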
Very good progress @wassup05.
n = p(start + 3)
k = p(start + 2)
allocate(temp(m,n))
call gemm('N', 'N', m, n, k, one_${s}$, m2, m, m3, k, zero_${s}$, temp, m)
Very good progress @wassup05, thank you! Imho this PR is almost ready to be merged. As you suggest, it would be good to have a nice wrapper for gemm. It has been discussed before. I would like to suggest that all calls to gemm are also wrapped into a stdlib_matmul function - now with two matrices only. This would give stdlib fully functional matmul functionality.
Here I suggest two possible APIs, and I will ask @jalvesz @jvdp1 @loiseaujc to discuss that together:
The first would be similar to gemm:

! API similar to gemm
${t1}$ function stdlib_matmul(A, Astate, B, Bstate) result(C)
    ${t1}$, intent(in) :: A(:,:), B(:,:)
    character, intent(in), optional :: Astate, Bstate
and could use the matrix state definitions already in use for the sparse operations:

stdlib/src/stdlib_sparse_constants.fypp, lines 16 to 18 in 5c64ee6:

character(1), parameter :: sparse_op_none = 'N' !! no transpose
character(1), parameter :: sparse_op_transpose = 'T' !! transpose
character(1), parameter :: sparse_op_hermitian = 'H' !! conjugate or hermitian transpose
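For concreteness, a possible (untested) shape of this first API for the double-precision case, assuming the gemm generic from stdlib_linalg_blas and dp from stdlib_kinds; the specific name stdlib_matmul_rdp is only illustrative:

function stdlib_matmul_rdp(A, Astate, B, Bstate) result(C)
    use stdlib_kinds, only: dp
    use stdlib_linalg_blas, only: gemm
    implicit none
    real(dp), intent(in) :: A(:,:), B(:,:)
    character, intent(in), optional :: Astate, Bstate
    real(dp), allocatable :: C(:,:)
    character(1) :: opA, opB
    integer :: m, n, k

    opA = 'N'; if (present(Astate)) opA = Astate
    opB = 'N'; if (present(Bstate)) opB = Bstate

    ! op(A) is m x k and op(B) is k x n ('N' keeps the array, 'T' transposes it)
    m = merge(size(A, 1), size(A, 2), opA == 'N')
    k = merge(size(A, 2), size(A, 1), opA == 'N')
    n = merge(size(B, 2), size(B, 1), opB == 'N')

    allocate(C(m, n))
    call gemm(opA, opB, m, n, k, 1.0_dp, A, size(A, 1), B, size(B, 1), 0.0_dp, C, m)
end function stdlib_matmul_rdp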
The second would be more ambitious and essentially zero-overhead: it would wrap the operation in a derived type (to be templated, of course):
type :: matrix_state_rdp
    real(dp), pointer :: A(:,:) => null()
    character(1) :: Astate = 'N'
end type matrix_state_rdp

interface transposed
    module procedure transposed_new_rdp
    ...
end interface transposed

type(matrix_state_rdp) function transposed_new_rdp(A) result(AT)
    real(dp), intent(inout), target :: A(:,:)
    AT%A => A
    AT%Astate = 'T'
end function
Then we could define a templated base interface:

! Work with normal matrices
${t1}$ function stdlib_matmul(A, B) result(C)
    ${t1}$, intent(in) :: A(:,:), B(:,:)

! Work with transposed/hermitian swaps
${rt}$ function stdlib_matmul(A, B) result(C)
    type(matrix_state_${rn}$), intent(in) :: A, B
So it would be clear to the user writing code:

C = stdlib_matmul(A, B)
C = stdlib_matmul(transposed(A), B)
C = stdlib_matmul(A, hermitian(C))
we could even make it an operator:
C = stdlib_matmul(A, B)
C = stdlib_matmul(.t.A, B)
C = stdlib_matmul(A, .h.C)
without it triggering any actual data movement.
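A hedged sketch of how such operators could be declared (not part of this PR). One caveat: defined-operator functions must take intent(in) arguments, which conflicts with the intent(inout), target declaration in the transposed_new_rdp sketch above, so that constructor would need some rethinking before this form compiles; hermitian_new_rdp is a hypothetical complex counterpart.

interface operator(.t.)
    module procedure transposed_new_rdp
end interface

interface operator(.h.)
    module procedure hermitian_new_rdp
end interface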
This PR attempts to address #931
An interface stdlib_matmul is created and extended to handle 3-5 matrices (works for integer, real, complex).

API
The algorithm for optimal parenthesization is the one given in "Introduction to Algorithms" by Cormen et al. (4th edition, Chapter 14, Section 2).
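For readers without the book at hand, a minimal sketch of that dynamic-programming recurrence (illustrative only; names and array bounds need not match the PR's matmul_chain_order, which uses its own conventions):

pure function chain_order(p) result(s)
    ! matrix i has shape p(i) x p(i+1); s(i, j) stores the split index k at which
    ! the product A_i..A_j is best divided into (A_i..A_k) * (A_{k+1}..A_j)
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none
    integer, intent(in) :: p(:)
    integer :: s(size(p)-1, size(p)-1)
    integer(int64) :: m(size(p)-1, size(p)-1), cost
    integer :: n, l, i, j, k

    n = size(p) - 1            ! number of matrices in the chain
    m = 0_int64
    s = 0
    do l = 2, n                ! l = length of the sub-chain
        do i = 1, n - l + 1
            j = i + l - 1
            m(i, j) = huge(cost)
            do k = i, j - 1
                cost = m(i, k) + m(k+1, j) + int(p(i), int64)*p(k+1)*p(j+1)
                if (cost < m(i, j)) then
                    m(i, j) = cost
                    s(i, j) = k
                end if
            end do
        end do
    end do
end function chain_order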
numpy's linalg.multi_dot also uses the same algorithm.

Although, as @jalvesz had suggested, I should have used gemm for the multiplication of the component matrices, this uses matmul for now; I can make that change if deemed appropriate once the major implementation has been given a green signal.

I have added a very basic example to play around with, and I will be adding the detailed docs, specs and tests once everybody approves of the major implementation.
I am not really happy with some parts of the code, like computing the size of all the matrices individually. If anyone has suggestions regarding that, or a cleaner way of implementing some of the other parts (perhaps some fypp magic), please do let me know.
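One possibility for the fypp question (purely illustrative, not code from this PR): generate the per-matrix bookkeeping with a fypp loop instead of spelling each case out by hand, for example

#:for i in range(3, 6)
    if (present(m${i}$)) then
        p(${i}$) = size(m${i}$, 1)
        num_present = num_present + 1
    end if
#:endfor

with the trailing dimension of the last present matrix still appended to p separately.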
Notes